Text Compression using Recency Rank with Context and Relation to Context Sorting, Block Sorting and PPM*
Abstract
The block sorting compression scheme was developed recently and its relation to statistical schemes has been studied, but its performance has not been analyzed well theoretically. Context sorting is a compression scheme based on context similarity; it is regarded as an online version of block sorting and is asymptotically optimal. However, it compresses more slowly, and its performance in practice is not much better. In this paper, we propose a compression scheme using recency rank code with context (RRC), which is also based on context similarity. The proposed method encodes characters as recency ranks according to their contexts. It can be implemented using a suffix tree, and the recency rank code is realized by a move-to-front transformation of the edges in the suffix tree. It is faster than context sorting and is also asymptotically optimal. Its performance is improved by changing models according to the length of the context and by combining several characters into one code. However, it is still inferior to block sorting in both performance and speed. We investigate the reason for this poor performance, prove the asymptotic optimality of a variation of block sorting, and clarify the relation among RRC, context sorting, block sorting, and PPM*.
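The idea of encoding each character as a recency rank within its context can be illustrated with a minimal Python sketch. This is not the paper's implementation: it assumes a fixed-length context of `order` preceding characters and keeps one move-to-front list per context in a dictionary, whereas the abstract describes a suffix tree with move-to-front on its edges. The function names, the `order` parameter, and the escape convention (rank equal to the list length, followed by the literal character) are all assumptions made for illustration.

```python
from collections import defaultdict


def rrc_encode(text, order=2):
    """Encode each character as its recency rank within its context.

    Assumption: the context is the fixed-length string of `order`
    preceding characters (the abstract uses a suffix tree instead).
    """
    recent = defaultdict(list)  # context -> symbols, most recently seen first
    codes = []
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]
        symbols = recent[ctx]
        if ch in symbols:
            rank = symbols.index(ch)
            codes.append((rank, None))
            symbols.pop(rank)
        else:
            # Hypothetical escape: rank == list length, plus the literal.
            codes.append((len(symbols), ch))
        symbols.insert(0, ch)  # move-to-front update
    return codes


def rrc_decode(codes, order=2):
    """Invert rrc_encode by replaying the same move-to-front updates."""
    recent = defaultdict(list)
    chars = []
    for rank, literal in codes:
        ctx = "".join(chars[max(0, len(chars) - order):])
        symbols = recent[ctx]
        ch = literal if literal is not None else symbols.pop(rank)
        symbols.insert(0, ch)
        chars.append(ch)
    return "".join(chars)
```

On repetitive input most codes become small ranks, which a later entropy coder could exploit; that stage is omitted here.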
Related resources
Enhanced Word-Based Block-Sorting Text Compression
The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to parse text into a stream of words, and then employ block sorting to identify and so exploit any conditioning relationships between words. In this paper we build upon the previous work of two of the authors, describing se...
The Context Trees of Block Sorting Compression
The Burrows-Wheeler transform (BWT) and block sorting compression are closely related to the context trees of PPM. The usual approach of treating BWT as merely a permutation is not able to fully exploit this relation. We show that an explicit context tree for BWT can be efficiently generated by taking a subset of the corresponding suffix tree, identify the central problems in exploiting its st...
On variants of block-sorting compression using context from both the left and right
The block-sorting text compression algorithm can be viewed as associating a context with each character to be compressed, and then sorting the characters on their contexts. Normally, the context associated with each character is the string to the left of the character. Recently, Ratushnyak suggested that it might be possible instead to build a context by interleaving characters taken alternatel...
Experiments with a Block Sorting Text Compression Algorithm
This report presents some preliminary work on a recently described “Block Sorting” lossless or text compression algorithm. While having little apparent relationship to established techniques, it has a performance which places it definitely among the best-known compressors. The original paper did little more than present the algorithm, with strong advice for efficient implementation...
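The view of block sorting as associating a context with each character and then sorting the characters on their contexts can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, the context of a character is taken to be the whole string to its left read from the nearest character outward, and the quadratic-cost sort and the missing inverse transform (which needs extra bookkeeping such as an end marker or index) are deliberate simplifications.

```python
def sort_by_left_context(text):
    """Return the characters of `text` ordered by their left contexts.

    The context of text[i] is text[:i] reversed, so the nearest
    preceding character is compared first. Forward transform only.
    """
    order = sorted(range(len(text)), key=lambda i: text[:i][::-1])
    return "".join(text[i] for i in order)
```

For the input `"banana"` this produces `"bnnaaa"`: characters that follow similar contexts end up adjacent, which is what makes a subsequent move-to-front or recency rank stage effective.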
Publication date: 1997